The author is dead, but what if they never lived? A reception experiment on Czech AI- and human-authored poetry

Marklová, Anna, Vinš, Ondřej, Vokáčová, Martina, Milička, Jiří

arXiv.org Artificial Intelligence

Large language models are increasingly capable of producing creative texts, yet most studies on AI-generated poetry focus on English -- a language that dominates training data. In this paper, we examine the perception of AI- and human-written Czech poetry. We ask whether Czech native speakers can identify the authorship and how they judge the poems aesthetically. Participants performed at chance level when guessing authorship (45.8% correct on average), indicating that Czech AI-generated poems were largely indistinguishable from human-written ones. Aesthetic evaluations revealed a strong authorship bias: when participants believed a poem was AI-generated, they rated it less favorably, even though AI poems were in fact rated equally or more favorably than human ones on average. A logistic regression model revealed that the more participants liked a poem, the less likely they were to assign its authorship correctly. Familiarity with poetry or a literary background had no effect on recognition accuracy. Our findings show that AI can convincingly produce poetry even in a morphologically complex Slavic language such as Czech, which is low-resource with respect to the training data of AI models. The results suggest that readers' beliefs about authorship and their aesthetic evaluation of a poem are interconnected.


Precipitation Downscaling with Spatiotemporal Video Diffusion

Neural Information Processing Systems

For dynamics, see Figure 1. Precipitation patterns are central to human and natural life. In a rapidly warming climate, reliable simulations of changing precipitation patterns can help adapt to climate change.


Object Centric Concept Bottlenecks

Steinmann, David, Stammer, Wolfgang, Wüst, Antonia, Kersting, Kristian

arXiv.org Artificial Intelligence

Developing high-performing, yet interpretable models remains a critical challenge in modern AI. Concept-based models (CBMs) attempt to address this by extracting human-understandable concepts from a global encoding (e.g., image encoding) and then applying a linear classifier on the resulting concept activations, enabling transparent decision-making. However, their reliance on holistic image encodings limits their expressiveness in object-centric real-world settings and thus hinders their ability to solve complex vision tasks beyond single-label classification. To tackle these challenges, we introduce Object-Centric Concept Bottlenecks (OCB), a framework that combines the strengths of CBMs and pre-trained object-centric foundation models, boosting performance and interpretability. We evaluate OCB on complex image datasets and conduct a comprehensive ablation study to analyze key components of the framework, such as strategies for aggregating object-concept encodings. The results show that OCB outperforms traditional CBMs and allows one to make interpretable decisions for complex visual tasks.
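The core mechanics the abstract describes, extracting concept activations per detected object, aggregating them, and feeding the result to a transparent linear classifier, can be sketched as follows. This is a minimal illustration with made-up dimensions and random weights, not the paper's implementation; max-pooling is assumed here as one of the several aggregation strategies the authors ablate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 3 detected objects, 64-d object encodings,
# 10 human-interpretable concepts, 5 output classes.
n_objects, enc_dim, n_concepts, n_classes = 3, 64, 10, 5

object_encodings = rng.normal(size=(n_objects, enc_dim))  # from a frozen object-centric backbone
W_concept = rng.normal(size=(enc_dim, n_concepts)) * 0.1  # concept extractor (learned)
W_cls = rng.normal(size=(n_concepts, n_classes)) * 0.1    # interpretable linear head

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Per-object concept activations in [0, 1], then aggregated across objects.
per_object_concepts = sigmoid(object_encodings @ W_concept)  # (n_objects, n_concepts)
scene_concepts = per_object_concepts.max(axis=0)             # (n_concepts,)

# The final decision is a linear function of concept activations, so each
# class score decomposes into per-concept contributions.
logits = scene_concepts @ W_cls
prediction = int(np.argmax(logits))
```

Because the head is linear, the contribution of each (object, concept) pair to the final score can be read off directly, which is what makes the decision inspectable.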


INF-3DP: Implicit Neural Fields for Collision-Free Multi-Axis 3D Printing

Qu, Jiasheng, Huang, Zhuo, Guo, Dezhao, Sun, Hailin, Lyu, Aoran, Dai, Chengkai, Yam, Yeung, Fang, Guoxin

arXiv.org Artificial Intelligence

We introduce a general, scalable computational framework for multi-axis 3D printing based on implicit neural fields (INFs) that unifies all stages of toolpath generation and global collision-free motion planning. In our pipeline, input models are represented as signed distance fields, with fabrication objectives such as support-free printing, surface finish quality, and extrusion control being directly encoded in the optimization of an implicit guidance field. This unified approach enables toolpath optimization across both surface and interior domains, allowing shell and infill paths to be generated via implicit field interpolation. The printing sequence and multi-axis motion are then jointly optimized over a continuous quaternion field. Our continuous formulation constructs the evolving printing object as a time-varying SDF, supporting differentiable global collision handling throughout INF-based motion planning. Compared to explicit-representation-based methods, INF-3DP achieves up to two orders of magnitude speedup and significantly reduces waypoint-to-surface error. We validate our framework on diverse, complex models and demonstrate its efficiency with physical fabrication experiments using a robot-assisted multi-axis system.
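The key representational idea, modeling the evolving printed object as a time-varying signed distance field and querying it for collision checks, can be illustrated with a toy example. This sketch uses an analytic sphere SDF that "grows" with print progress t; it is an assumed simplification, not the paper's optimization pipeline.

```python
import numpy as np

def sphere_sdf(p, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(p - center) - radius

def printed_sdf(p, t):
    """Toy time-varying SDF: the part grows as printing progresses, t in [0, 1]."""
    return sphere_sdf(p, center=np.zeros(3), radius=0.5 * t)

def is_collision_free(waypoint, t, clearance=0.05):
    """A nozzle waypoint is safe if it stays at least `clearance` outside
    the already-printed material at time t."""
    return printed_sdf(waypoint, t) > clearance

# A point far from the part is safe; a point inside the printed volume is not.
safe = is_collision_free(np.array([1.0, 0.0, 0.0]), t=1.0)
unsafe = is_collision_free(np.array([0.1, 0.0, 0.0]), t=1.0)
```

Because the SDF is a smooth function of both the query point and t, such clearance queries admit gradients, which is what makes the global collision handling in the abstract differentiable.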


VideoPanda: Video Panoramic Diffusion with Multi-view Attention

Xie, Kevin, Sabour, Amirmojtaba, Huang, Jiahui, Paschalidou, Despoina, Klar, Greg, Iqbal, Umar, Fidler, Sanja, Zeng, Xiaohui

arXiv.org Artificial Intelligence

[Figure caption: single-view video inputs were generated using existing video generation models (Brooks et al., 2024; Runway, 2024); auto-regressive generation is applied to extend the video length.] High resolution panoramic video content is paramount for immersive experiences in Virtual Reality, but is non-trivial to collect as it requires specialized equipment and intricate camera setups. In this work, we introduce VideoPanda, a novel approach for synthesizing 360 videos conditioned on text or single-view video data. VideoPanda leverages multi-view attention layers to augment a video diffusion model, enabling it to generate consistent multi-view videos that can be combined into immersive panoramic content. VideoPanda is trained jointly under two conditions, text-only and single-view video, and supports autoregressive generation of long videos. To overcome the computational burden of multi-view video generation, we randomly subsample the duration and camera views used during training and show that the model gracefully generalizes to generating more frames during inference. Extensive evaluations on both real-world and synthetic video datasets demonstrate that VideoPanda generates more realistic and coherent 360 panoramas across all input conditions compared to existing methods.
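The random subsampling trick the abstract mentions, training on a random subset of camera views and a random temporal window rather than the full multi-view clip, can be sketched as below. The function names and counts are illustrative assumptions, not the paper's code.

```python
import random

def subsample_batch(n_views_total, n_frames_total, n_views_train, n_frames_train, rng):
    """Pick a random subset of camera views and a random contiguous frame
    window for one training step, reducing the multi-view memory cost."""
    views = sorted(rng.sample(range(n_views_total), n_views_train))
    start = rng.randrange(n_frames_total - n_frames_train + 1)
    frames = list(range(start, start + n_frames_train))
    return views, frames

rng = random.Random(0)
# E.g., 8 panorama views and 64 frames available; train on 4 views x 16 frames.
views, frames = subsample_batch(8, 64, 4, 16, rng)
```

At inference the model then runs on more frames and views than any single training step saw, which is the generalization behavior the abstract reports.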


Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation

Seo, Hoigi, Jeong, Wongi, Seo, Jae-sun, Chun, Se Young

arXiv.org Artificial Intelligence

Large-scale text encoders in text-to-image (T2I) diffusion models have demonstrated exceptional performance in generating high-quality images from textual prompts. Unlike denoising modules that rely on multiple iterative steps, text encoders require only a single forward pass to produce text embeddings. However, despite their minimal contribution to total inference time and floating-point operations (FLOPs), text encoders demand significantly higher memory usage, up to eight times more than denoising modules. To address this inefficiency, we propose Skip and Re-use layers (Skrr), a simple yet effective pruning strategy specifically designed for text encoders in T2I diffusion models. Skrr exploits the inherent redundancy in transformer blocks by selectively skipping or reusing certain layers in a manner tailored for T2I tasks, thereby reducing memory consumption without compromising performance. Extensive experiments demonstrate that Skrr maintains image quality comparable to the original model even under high sparsity levels, outperforming existing blockwise pruning methods. Furthermore, Skrr achieves state-of-the-art memory efficiency while preserving performance across multiple evaluation metrics, including the FID, CLIP, DreamSim, and GenEval scores.
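The skip-and-reuse idea can be illustrated with a toy stack of residual blocks and a fixed per-layer plan. This is an assumed sketch: the actual method selects which layers to skip or re-use in a manner tailored to T2I performance, whereas the plan below is hard-coded.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Toy "transformer blocks": each is a residual update h + f_i(h).
weights = [rng.normal(size=(d, d)) * 0.05 for _ in range(6)]

def block(h, W):
    return h + np.tanh(h @ W)

# Hypothetical plan: "run" executes the block, "skip" drops it entirely,
# ("reuse", j) re-applies block j's weights instead of storing new ones.
# Skipped blocks (2 and 5 here) never need their weights in memory.
plan = ["run", "run", "skip", ("reuse", 1), "run", "skip"]

h = rng.normal(size=(d,))
for i, action in enumerate(plan):
    if action == "run":
        h = block(h, weights[i])
    elif action == "skip":
        continue
    else:  # ("reuse", j): share weights with an earlier block
        h = block(h, weights[action[1]])
```

The memory saving comes from never materializing the skipped blocks' parameters and storing shared weights once; the residual structure is what lets dropped or repeated blocks still produce a usable embedding.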